AITopics | video prediction model

video prediction model

d9042abf40782fbce28901c1c9c0e8d8-Paper-Conference.pdf

Neural Information Processing Systems | Feb-17-2026, 10:09:16 GMT

machine learning, natural language, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.46)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)

High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

Ruben Villegas, Arkanath Pathak, Harini Kannan, Dumitru Erhan, Quoc V. Le, Honglak Lee

Neural Information Processing Systems | Feb-15-2026, 04:32:40 GMT

Neural Information Processing Systems http://nips.cc/

evaluation, prediction, video prediction, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan (0.04)
North America > Canada (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.77)

f490d0af974fedf90cb0f1edce8e3dd5-Paper.pdf

Neural Information Processing Systems | Feb-11-2026, 22:31:37 GMT

Deep learning has enabled algorithms to generate realistic images. However, accurately predicting long video sequences requires understanding long-term dependencies and remains an open challenge.

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > United States (0.04)
North America > Puerto Rico > San Juan > San Juan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Video Prediction Models as Rewards for Reinforcement Learning

Neural Information Processing Systems | Dec-26-2025, 22:19:24 GMT

Specifying reward signals that allow agents to learn complex behaviors is a long-standing challenge in reinforcement learning. A promising approach is to extract preferences for behaviors from unlabeled videos, which are widely available on the internet.

name change, reinforcement learning, video prediction model, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)

Clockwork Variational Autoencoders

Neural Information Processing Systems | Dec-25-2025, 06:32:01 GMT

Deep learning has enabled algorithms to generate realistic images. However, accurately predicting long video sequences requires understanding long-term dependencies and remains an open challenge. While existing video prediction models succeed at generating sharp images, they tend to fail at accurately predicting far into the future. We introduce the Clockwork VAE (CW-VAE), a video prediction model that leverages a hierarchy of latent sequences, where higher levels tick at slower intervals. We demonstrate the benefits of both hierarchical latents and temporal abstraction on 4 diverse video prediction datasets with sequences of up to 1000 frames, where CW-VAE outperforms top video prediction models. Additionally, we propose a Minecraft benchmark for long-term video prediction. We conduct several experiments to gain insights into CW-VAE and confirm that slower levels learn to represent objects that change more slowly in the video, and faster levels learn to represent faster objects.
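
As a rough illustration of what "higher levels tick at slower intervals" can mean, the sketch below assumes that level l of the hierarchy updates once every k^l time steps, where k is a temporal abstraction factor. The function names, the 0-indexed levels, and the default values are hypothetical choices for this example and are not taken from the abstract.

```python
def active_levels(t: int, num_levels: int = 3, k: int = 2) -> list[int]:
    """Return the 0-indexed levels whose latent state updates at step t.

    Assumption for this sketch: level l ticks once every k**l steps,
    so level 0 updates at every step and higher levels update less often.
    """
    return [level for level in range(num_levels) if t % (k ** level) == 0]


def tick_schedule(num_steps: int, num_levels: int = 3, k: int = 2) -> list[list[int]]:
    """List the active levels for each of the first num_steps time steps."""
    return [active_levels(t, num_levels, k) for t in range(num_steps)]
```

With three levels and k = 2, level 0 updates at every step, level 1 at every second step, and level 2 at every fourth step; the slower levels hold their state in between.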

clockwork variational autoencoder, electronic proceedings, name change, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.41)

Scalable Policy Evaluation with Video World Models

Tseng, Wei-Cheng, Gu, Jinwei, Zhang, Qinsheng, Mao, Hanzi, Liu, Ming-Yu, Shkurti, Florian, Yen-Chen, Lin

arXiv.org Artificial Intelligence | Dec-5-2025

Training generalist policies for robotic manipulation has shown great promise, as they enable language-conditioned, multi-task behaviors across diverse scenarios. However, evaluating these policies remains difficult because real-world testing is expensive, time-consuming, and labor-intensive. It also requires frequent environment resets and carries safety risks when deploying unproven policies on physical robots. Manually creating and populating simulation environments with assets for robotic manipulation has not addressed these issues, primarily due to the significant engineering effort required and the substantial sim-to-real gap, both in terms of physics and rendering. In this paper, we explore the use of action-conditional video generation models as a scalable way to learn world models for policy evaluation. We demonstrate how to incorporate action conditioning into existing pre-trained video generation models. This allows leveraging internet-scale in-the-wild online videos during the pre-training stage and alleviates the need for a large dataset of paired video-action data, which is expensive to collect for robotic manipulation. Our paper examines the effect of dataset diversity, pre-trained weights, and common failure cases for the proposed evaluation pipeline. Our experiments demonstrate that across various metrics, including policy ranking and the correlation between actual policy values and predicted policy values, these models offer a promising approach for evaluating policies without requiring real-world interactions.
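
To make the two evaluation criteria named above concrete, here is a minimal sketch of how agreement between real and predicted policy values might be scored. The abstract does not say which ranking metric is used, so Spearman rank correlation stands in for "policy ranking" here; the function names and the example values are hypothetical.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length lists of values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def ranks(values: list[float]) -> list[float]:
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical success rates for three policies.
real_values = [0.2, 0.5, 0.9]       # measured on a physical robot
predicted_values = [0.1, 0.4, 0.6]  # estimated from world-model rollouts
```

Calling `pearson(real_values, predicted_values)` measures how closely the predicted values track the real ones, while `spearman(real_values, predicted_values)` only checks whether the policies are ordered the same way.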

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2511.1152

Country: North America (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.62)

Video Prediction of Dynamic Physical Simulations With Pixel-Space Spatiotemporal Transformers

Slack, Dean L, Hudson, G Thomas, Winterbottom, Thomas, Moubayed, Noura Al

arXiv.org Artificial Intelligence | Oct-24-2025

Abstract: Inspired by the performance and scalability of autoregressive large language models, transformer-based models have seen recent success in the visual domain. This study investigates a transformer adaptation for video prediction with a simple end-to-end approach, comparing various spatiotemporal self-attention layouts. Focusing on causal modelling of physical simulations over time, a common shortcoming of existing video-generative approaches, we attempt to isolate spatiotemporal reasoning via physical object tracking metrics and unsupervised training on physical simulation datasets. We introduce a simple yet effective pure transformer model for autoregressive video prediction that utilises continuous pixel-space representations. Without the need for complex training strategies or latent feature-learning components, our approach extends the time horizon for physically accurate predictions by up to 50% when compared with existing latent-space approaches, while maintaining comparable performance on common video quality metrics. Additionally, we conduct interpretability experiments, using probing models to identify network regions that encode information useful for accurately estimating PDE simulation parameters, and find that this generalises to the estimation of out-of-distribution simulation parameters. This work serves as a platform for further attention-based spatiotemporal modelling of videos via a simple, parameter-efficient, and interpretable approach.
Recent progress in the development of transformer-based [1] generative models, particularly text-generative models in Natural Language Processing (NLP), has led to increased efforts to extend their application beyond the linguistic domain [2, 3, 4]. Building on the success of generative modelling in the image domain, such as Variational Autoencoders (VAEs) [5] and Diffusion models [6], recent advances have extended to generative modelling of videos. This is an increasingly active area of research, focusing on the development of novel architectures and techniques for model interpretability [7, 4, 8].

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TNNLS.2025.3585949

2510.20807

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Video Prediction Models as Rewards for Reinforcement Learning

Alejandro Escontrela, Ademi Adeniji, Wilson Yan

Neural Information Processing Systems | Oct-10-2025, 23:47:04 GMT

Specifying reward signals that allow agents to learn complex behaviors is a longstanding challenge in reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.46)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

Ruben Villegas, Arkanath Pathak, Harini Kannan, Dumitru Erhan, Quoc V. Le, Honglak Lee

Neural Information Processing Systems | Aug-20-2025, 09:44:20 GMT

Predicting future video frames is extremely challenging, as there are many factors of variation that make up the dynamics of how frames change through time.

evaluation, prediction, video prediction, (12 more...)

Neural Information Processing Systems

Country: